Thirty Days of Metal — Day 23: Interaction

Warren Moore
May 3, 2022

This series of posts is my attempt to present the Metal graphics programming framework in small, bite-sized chunks for Swift app developers who haven’t done GPU programming before.

If you want to work through this series in order, start here. To download the sample code for this article, go here.

What fun is a dense asteroid field in the middle of space if you can’t fly through it? In this article, we will learn the basics of interaction. We will learn how to map mouse movement and keyboard presses in macOS to camera movement, and we will also get familiar with the GameController framework and learn how to simulate a game controller on iOS even if we don’t have a physical controller.

Camera Controllers

So far, to establish a point of view for our virtual camera, we have set a fixed position and orientation. Implementing an interactive camera consists of computing a new position and orientation for the camera when the user provides input.

A camera controller is an object that has the responsibility of turning input events into camera transform updates. There are many possible camera controller schemes to choose from, depending on the needs of the application. A few are described below.

  • Turntable: The camera points toward the center of the scene, which can be rotated about its vertical axis, as if it were sitting on a turntable or potter’s wheel.
  • Arcball: The scene is enclosed in an invisible sphere (like a hamster ball), and rotates about an axis perpendicular to the direction of drag gestures.
  • Pan: The camera moves in the XY (vertical) plane at a constant distance from the scene.
  • Truck: The camera moves in the XZ plane at a constant height.
  • Fly: The camera moves freely in 3D space, often with the keyboard controlling position and the mouse controlling orientation.

In the sections below we will discuss how to implement a fly camera so we can fly around the instanced asteroid field we made last time.

Fly Cameras

A fly camera can have up to six degrees of freedom: movement along the X, Y, and Z axes, and rotation around each axis. Rotation around the X, Y, and Z axes is called pitch, yaw, and roll, respectively. Other texts use a different order of axes, but the motion is the same.

The fly camera controller we will implement will only have four degrees of freedom: we will not allow the camera to roll around its local Z axis, nor will we allow translational movement along the local Y axis. Even with these limitations, our camera controller will allow us to navigate quite naturally in 3D space.

We will implement the logic for our fly camera controller in the FlyCameraController class. The controller will hold a reference to a node, called its point of view, that represents the virtual camera:

class FlyCameraController {
    let pointOfView: Node

    // …

    init(pointOfView: Node) {
        self.pointOfView = pointOfView
    }

    // …
}

The purpose of the camera controller is to turn input events into camera motion. The controller stores a few properties that it composes into the camera transform. The eye member stores the camera position, the look member stores the forward direction vector, and the up member stores the current up direction vector. It is no accident that these members look a lot like the parameters to the lookAt transform initializer.

private var eye = SIMD3<Float>(0, 0, 0)
private var look = SIMD3<Float>(0, 0, -1)
private var up = SIMD3<Float>(0, 1, 0)

We also need a few constants that affect camera behavior. invertYLook is a boolean that controls whether the pitch angle is inverted before rotation is performed, a common option in video games. eyeSpeed holds the speed, in units per second, at which the camera moves. radiansPerLookPoint holds the number of radians subtended by one screen point, which determines rotation speed.

let invertYLook = false
let eyeSpeed: Float = 6
let radiansPerLookPoint: Float = 0.017

The central method of the camera controller is update(timestep:lookDelta:moveDelta:):

func update(timestep: Float,
            lookDelta: SIMD2<Float>,
            moveDelta: SIMD2<Float>)

This method is agnostic to the input source: we can map keyboard and mouse events, touch gestures, or game controller inputs onto these parameters. We will see below how to do so on macOS and iOS.

The update method has three phases: update the position, update the orientation, and assign the new transformation to the point of view.

To find the new position, we first construct normalized vectors pointing in the X and Z directions, since the camera only moves in its XZ plane. We call these directions right and forward:

let right = normalize(cross(look, up))
var forward = look

We then find the camera’s movement direction by multiplying the movement inputs by their respective directions:

let deltaX = moveDelta[0], deltaZ = moveDelta[1]
let movementDir = SIMD3<Float>(
    deltaX * right.x + deltaZ * forward.x,
    deltaX * right.y + deltaZ * forward.y,
    deltaX * right.z + deltaZ * forward.z)

Finally, we update the position by multiplying this movement direction by the speed and timestep factors:

eye += movementDir * eyeSpeed * timestep

To update the orientation, we perform two rotations on the forward direction: one around the up vector (yaw), and one around the right vector (pitch). Rather than using matrix multiplication, we will exploit the power of quaternions to perform these rotations efficiently.
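Before applying quaternions to the camera, it may help to see one rotate a vector in isolation. Here is a minimal, standalone sketch using the simd module's act(_:) method (the rotate(_:) call used later in this article is assumed to be a small convenience wrapper from an earlier installment):

```swift
import simd

// A 90° (π/2) rotation about the +Y (up) axis.
let yaw = simd_quaternion(Float.pi / 2, SIMD3<Float>(0, 1, 0))

// Rotating the default forward vector (0, 0, -1) yields
// approximately (-1, 0, 0): the camera now looks down -X,
// i.e. it has turned 90° to the left.
let rotated = yaw.act(SIMD3<Float>(0, 0, -1))
```

Composing two quaternions with simd_mul produces a single quaternion representing both rotations, which is exactly how we combine pitch and yaw below.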

To find the yaw rotation, we scale the X look input by the rotational speed factor, then construct a quaternion that represents a rotation by this angle around the Y axis:

let yaw = -lookDelta.x * radiansPerLookPoint
let yawRotation = simd_quaternion(yaw, up)

Finding the pitch rotation is similar, but we optionally apply look-inversion:

var pitch = lookDelta.y * radiansPerLookPoint
if invertYLook { pitch *= -1.0 }
let pitchRotation = simd_quaternion(pitch, right)

We form the combined rotation by multiplying the quaternions together, then update the forward vector by rotating it to the new bearing:

let rotation = simd_mul(pitchRotation, yawRotation)
forward = rotation.rotate(forward)

Since this procedure has potentially made our forward direction nonorthogonal with the up and right directions, we normalize the forward vector and reconstruct a new up vector:

look = normalize(forward)
up = cross(right, look)

After this sequence of operations, we are assured that our right, up, and forward vectors form an orthonormal set. Now we simply call on our look-at initializer to construct the camera’s new transformation matrix:

pointOfView.transform = simd_float4x4(
    lookAt: eye + look, from: eye, up: up)
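Putting the three phases together, the complete update method (assembled from the snippets above; rotate(_:) and the lookAt initializer come from earlier installments) reads:

```swift
func update(timestep: Float,
            lookDelta: SIMD2<Float>,
            moveDelta: SIMD2<Float>)
{
    // Phase 1: update the position.
    let right = normalize(cross(look, up))
    var forward = look

    let deltaX = moveDelta[0], deltaZ = moveDelta[1]
    let movementDir = SIMD3<Float>(
        deltaX * right.x + deltaZ * forward.x,
        deltaX * right.y + deltaZ * forward.y,
        deltaX * right.z + deltaZ * forward.z)
    eye += movementDir * eyeSpeed * timestep

    // Phase 2: update the orientation (yaw about up, pitch about right).
    let yaw = -lookDelta.x * radiansPerLookPoint
    let yawRotation = simd_quaternion(yaw, up)

    var pitch = lookDelta.y * radiansPerLookPoint
    if invertYLook { pitch *= -1.0 }
    let pitchRotation = simd_quaternion(pitch, right)

    let rotation = simd_mul(pitchRotation, yawRotation)
    forward = rotation.rotate(forward)

    // Re-orthonormalize the basis vectors.
    look = normalize(forward)
    up = cross(right, look)

    // Phase 3: assign the new transform to the point of view.
    pointOfView.transform = simd_float4x4(
        lookAt: eye + look, from: eye, up: up)
}
```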

That completes the camera controller logic, but how do we determine the input parameters of the update method? Below, we will talk about the different input modes we have available on macOS and iOS.

Mouse Input on macOS

To determine how far the mouse has moved between updates, we will add two member variables to our view controller, which will drive our camera controller updates:

var previousMousePoint = CGPoint.zero
var currentMousePoint = CGPoint.zero

We are mostly concerned with drag events, which occur when the mouse is moved while its button is pressed. We implement three methods from the NSResponder class to receive events when the button is pressed, when the mouse is dragged, and when the button is released:

override func mouseDown(with event: NSEvent) {
    let mouseLocation = self.view.convert(event.locationInWindow, from: nil)
    currentMousePoint = mouseLocation
    previousMousePoint = mouseLocation
}

override func mouseDragged(with event: NSEvent) {
    let mouseLocation = self.view.convert(event.locationInWindow, from: nil)
    previousMousePoint = currentMousePoint
    currentMousePoint = mouseLocation
}

override func mouseUp(with event: NSEvent) {
    let mouseLocation = self.view.convert(event.locationInWindow, from: nil)
    previousMousePoint = mouseLocation
    currentMousePoint = mouseLocation
}

In each event responder method, we convert the mouse location into the 2D coordinates of our view, then update our member variables to track the mouse over time.

Keyboard Input on macOS

Receiving keyboard events is slightly more involved, since our view controller must be added to the responder chain in order to receive keypresses. We enable this by implementing two methods in our view controller: when the view appears, we ask the window to make the view controller the first responder, and we respond affirmatively when the window system asks whether we want to be first responder:

override func viewDidAppear() {
    super.viewDidAppear()
    view.window?.makeFirstResponder(self)
}

override func becomeFirstResponder() -> Bool {
    return true
}

To keep track of which keys are pressed at a given moment, we store an array of Booleans that holds the pressed state of each possible key code:

var keysPressed = [Bool](repeating: false, count: Int(UInt16.max))

When a key is pressed or released, we receive the event with two additional responder methods:

override func keyDown(with event: NSEvent) {
    keysPressed[Int(event.keyCode)] = true
}

override func keyUp(with event: NSEvent) {
    keysPressed[Int(event.keyCode)] = false
}

With the mouse and key events handled, we can now turn to how we map them to our camera controller inputs.

Mapping Keyboard and Mouse Inputs

To update our camera on a regular cadence, we create a timer that fires at our expected frame rate of 60 Hz. Each time the timer fires, we call our view controller’s updateCamera(_:) method.

let frameDuration = 1.0 / Double(mtkView.preferredFramesPerSecond)
Timer.scheduledTimer(withTimeInterval: frameDuration,
                     repeats: true)
{ [weak self] _ in
    self?.updateCamera(Float(frameDuration))
}

In updateCamera(_:), we use the mouse and key state accumulated since the previous frame. First, we find the distance the mouse moved, then save the current mouse position as the previous mouse position.

let cursorDeltaX = Float(currentMousePoint.x - previousMousePoint.x)
let cursorDeltaY = Float(currentMousePoint.y - previousMousePoint.y)
previousMousePoint = currentMousePoint
let mouseDelta = SIMD2<Float>(cursorDeltaX, cursorDeltaY)

We will use the traditional first-person shooter key layout (“WASD”) to move the camera forward, backward, left, and right.

let forwardPressed = keysPressed[VirtualKey.ANSI_W.rawValue]
let backwardPressed = keysPressed[VirtualKey.ANSI_S.rawValue]
let leftPressed = keysPressed[VirtualKey.ANSI_A.rawValue]
let rightPressed = keysPressed[VirtualKey.ANSI_D.rawValue]
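The VirtualKey type is not defined in this article; it is presumably a thin enum over the macOS virtual key codes declared in Carbon’s HIToolbox/Events.h. A minimal sketch covering just the keys used here (the raw values are the standard ANSI key code constants):

```swift
// Raw values match Carbon's kVK_ANSI_* constants
// (HIToolbox/Events.h): A = 0x00, S = 0x01, D = 0x02, W = 0x0D.
enum VirtualKey: Int {
    case ANSI_A = 0x00
    case ANSI_S = 0x01
    case ANSI_D = 0x02
    case ANSI_W = 0x0D
}
```

Note that virtual key codes identify physical key positions, not characters, so WASD movement keeps its layout even when a non-QWERTY character mapping is active.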

We find the net motion by adding up the influences of the pressed keys.

let deltaX: Float = (leftPressed ? -1.0 : 0.0) + (rightPressed ? 1.0 : 0.0)
let deltaZ: Float = (backwardPressed ? -1.0 : 0.0) + (forwardPressed ? 1.0 : 0.0)
let keyDelta = SIMD2<Float>(deltaX, deltaZ)

Then, we call the camera controller’s update method with our look and movement vectors:

cameraController.update(timestep: timestep,
                        lookDelta: mouseDelta,
                        moveDelta: keyDelta)

This completes the logic for mouse and keyboard control on macOS. Next, we will discuss how to use the GameController framework to handle physical and virtual game controller inputs.

The GameController Framework

The GameController framework is a library that allows us to easily receive input from a huge variety of game input devices, from the Apple TV Siri Remote to the Sony DualSense controller. These various devices are abstracted over by the GCController class.

To use the GameController framework, we import its module at the top of our view controller’s implementation file:

import GameController

To receive notifications when controllers are connected and disconnected, we register ourselves with the default notification center.

private func registerControllerObservers() {
    NotificationCenter.default.addObserver(
        forName: NSNotification.Name.GCControllerDidConnect,
        object: nil,
        queue: nil)
    { [weak self] notification in
        if let controller = notification.object as? GCController {
            self?.controllerDidConnect(controller)
        }
    }

    NotificationCenter.default.addObserver(
        forName: NSNotification.Name.GCControllerDidDisconnect,
        object: nil,
        queue: nil)
    { [weak self] notification in
        if let controller = notification.object as? GCController {
            self?.controllerDidDisconnect(controller)
        }
    }
}
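This registration method needs to run once, before any controller connects. The call site is not shown in the article, so the placement below is one reasonable choice rather than the sample’s actual code; assuming the usual view-controller setup, viewDidLoad works well:

```swift
override func viewDidLoad() {
    super.viewDidLoad()
    // …existing Metal and scene setup…

    // Start listening for controller connect/disconnect notifications.
    registerControllerObservers()
}
```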

We also store a reference to the connected controller, if any:

var gameController: GCController?

We can now listen to game controller connection and disconnection notifications and respond appropriately:

private func controllerDidConnect(_ controller: GCController) {
    gameController = controller
}

private func controllerDidDisconnect(_ controller: GCController) {
    gameController = nil
}

When we have a game controller connected on macOS, we bypass the keyboard and mouse logic and instead defer to the controller input. A GCController object has an optional member of type GCExtendedGamepad that gives us access to the various buttons and directional controls on the controller.

To update the camera controller with gamepad input, we first ask the controller for its gamepad:

if let gamepad = gameController?.extendedGamepad {

An extended gamepad has built-in support for multiple directional joysticks: these are represented by the leftThumbstick and rightThumbstick members. Each thumbstick has X and Y axes that smoothly vary based on how far the thumbstick is pressed in each direction. We map the left thumbstick to camera movement and the right thumbstick to camera orientation:

let lookX = gamepad.rightThumbstick.xAxis.value
let lookY = gamepad.rightThumbstick.yAxis.value
let lookDelta = SIMD2<Float>(lookX, lookY)
let moveZ = gamepad.leftThumbstick.yAxis.value
let moveDelta = SIMD2<Float>(0, moveZ)

Then, as we did with our keyboard and mouse input, we update the camera controller with our mapped values:

cameraController.update(timestep: timestep,
                        lookDelta: lookDelta,
                        moveDelta: moveDelta)
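As written, the mapping ignores the left thumbstick’s X axis, so the gamepad cannot strafe the way the A and D keys can. Supporting that is a small variation on the mapping above (not part of the sample code):

```swift
// Map the left thumbstick's X axis to sideways (strafe) movement,
// mirroring the A/D keys on macOS.
let moveX = gamepad.leftThumbstick.xAxis.value
let moveDelta = SIMD2<Float>(moveX, moveZ)
```

Because update(timestep:lookDelta:moveDelta:) multiplies the first movement component by the camera’s right vector, this slots in without any changes to the controller itself.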

This is all fine, but what if we are running on iOS and don’t have a physical game controller? It turns out that the GameController framework once again comes to the rescue.

Virtual Controllers on iOS

Starting with iOS 15, the GameController framework includes the GCVirtualController class, a flexible means of defining a virtual gamepad that renders on the screen and generates gamepad events from multitouch inputs.

To create and connect a virtual controller, we first make a GCVirtualController.Configuration object. This allows us to request which sticks and buttons we want our virtual controller to have. iOS determines how to lay out and render each input element. In our case, we just want two virtual thumbsticks:

let controllerConfig = GCVirtualController.Configuration()
controllerConfig.elements = [
    GCInputLeftThumbstick,
    GCInputRightThumbstick,
]

With our controller configuration specified, we can create and connect a virtual controller:

let controller = GCVirtualController(configuration: controllerConfig)
controller.connect()

This virtual controller looks to the rest of the system just like a physical controller. As soon as we call connect(), the virtual controller appears on-screen, and a connection notification is sent to our view controller, allowing us to poll for controller status exactly as if we’d plugged in a physical controller.
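One housekeeping detail: if we keep the virtual controller in a property (called virtualController here, an assumption — the article shows only a local constant), we can remove it from the screen with GCVirtualController’s matching disconnect() method when the view goes away:

```swift
override func viewWillDisappear(_ animated: Bool) {
    super.viewWillDisappear(animated)

    // Tear down the on-screen controller; this also posts a
    // GCControllerDidDisconnect notification.
    virtualController?.disconnect()
}
```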

Here is what the virtual thumbsticks look like on an iPhone:

For more information, see the WWDC 2021 talk on game controllers.

This concludes our exploration of camera controllers and input mapping. Next time we will look at a notoriously hard problem — transparency — and some solutions for it.

Warren Moore

Real-time graphics engineer based in San Francisco, CA.